This paper introduces a nonstationary model for images and develops an adaptive intrafield dpcm codec based upon the model. The codec attempts to minimize the mean-square coding error at each sample point in the picture. The quantizer in the resulting adaptive codec is found to be similar to that previously obtained from visual masking considerations. Comparative simulation results using 256 × 256 pixel rasters are given for two- and three-bit/pixel versions of the adaptive codec, the three-bit/pixel Graham codec, and three-bit/pixel previous-element dpcm.

I. INTRODUCTION 

This paper introduces a nonstationary model for images and develops an adaptive intrafield codec based upon the model. The codec adaptively estimates both the mean and the probable range of values of the next picture sample to be encoded and adapts the predictor and quantizer accordingly. In so doing, the coder attempts to minimize the mean-square coding error (mmse) at each sample point in the picture. The mmse distortion measure is generally acknowledged to be a poor indicator of image quality.¹ However, when it is applied on a point (rather than area) basis in conjunction with the image model presented here, the resulting coder adaptation and coding quality are found to be comparable to those previously obtained from visual masking considerations. This result follows from a property of human vision, stressed by Graham,² concerning the strong connection between image chaos (unpredictability) and the visual system's tolerance to noise-like coding distortion. Because of this property, we obtain good image quality at two bits per pel and excellent quality at three bits per pel in a dpcm codec designed solely using the mmse criterion: masking phenomena are in large part accounted for automatically when the source model more adequately represents actual images and when the distortion criterion is applied on a point basis.

II. SOURCE MODEL 

This section introduces a nonstationary causal source model for the 
intrafield video process that will be used to develop the adaptive 
predictive intrafield codec of Section IV. Motivation for the model is 
intuitive and follows from an examination of a representative video 
signal of the type we wish to encode, such as that shown in Fig. 1. This 
is a frame of two interlaced fields, each having 256 pixels per line and 
128 lines, with amplitudes stored as 8-bit quantities. The essential 
characteristic of this (and any) image is that it is a projective transfor- 
mation of a collection of physical objects. As a consequence, the image 
is partitioned into regions of luminance elements whose amplitudes 
are interrelated by the physical structure of the objects they represent. 
The result is an array of pixels composed of distinct regions having 
slowly varying "brightness" and "texture" with abrupt boundaries (the 
picture outline) separating one region from another. We find it natural 
to view this array as a field that is partitioned into regions of independent quasi-stationary subfields. Two underlying random phenomena are involved: the random amplitudes of picture elements within a given subfield and the random selection of the subfield with respect to raster coordinates. A source model that incorporates both phenomena is shown in Fig. 2.

Fig. 1 — Checker girl original.

Figure 2 models the image generation process as a composite of $Q$ autoregressive sources, $q = 1, 2, \ldots, Q$, and one white source, $q = 0$. Switches S1 and S2 determine which source generates the output luminance $s_{mn} = s_t$, where $m$ and $n$ are, respectively, the line number in the field and the column number of the pixel, and $t$ is the time at which the pixel is encountered during conventional line scanning. The autoregressive sources, characterized by predictors 1 through $Q$ and the "innovations" process $w_{mn} = w_t$, provide a set of $Q$ possible processes from which the regions of slowly varying brightness and texture of a subfield in an actual image can be approximated. The random variables $w_t$ are assumed to be zero mean, independent, and characterized by a single known probability density function (pdf) $g(w)$. The predictors $F_q$ in sources $q = 1, 2, \ldots, Q$ are taken to be linear functions of pixels from the local past neighborhood of coordinate $(m, n) = t$. Section IV discusses the specific predictors chosen for the codec of this paper (Table I). Source 0 models those pixels of an actual image that either have no structural relation to previous pixels or whose relation to these pixels is not adequately modeled by sources 1 through $Q$. Such pixels tend to occur in highly chaotic regions of the image and at certain boundaries at which new subfields are initiated. Since this source represents the extreme of chaos possible in an image, its output is taken to be a sequence of random variables $F_0 = F_0(m, n)$, each uniformly distributed over $[0, 255]$.

Fig. 2 — Source model. Switches S1 and S2 are governed by eq. (2) in the text.

Switches S1 and S2 of the model determine which source is used for the final output at each raster coordinate and thus determine the image outline as well as more subtle structural changes. "Outline" and "subtle structural changes" are subjectively perceived qualities of an image that are difficult to quantify probabilistically. However, in an actual image "structure" tends to vary slowly: boundaries between regions are an exception, but even there the discontinuity is generally along only one dimension. This suggests that probabilistic information regarding the source in operation at time $t$ can be inferred by appropriate processing of the pixels in the local past vicinity (in the same field) of the pixel in question. To arrive at a source model that characterizes this quasi-stationarity in the simplest way, we model the image source as choosing sources $q = 0, 1, 2, \ldots, Q$ independently according to unknown first-order probabilities $P[q; (m, n)]$ that are slowly varying functions of the coordinates $(m, n)$. We further assume that the $Q$ textures generated by sources $q = 1, 2, \ldots, Q$ are a priori equally likely for a random choice of coordinate $(m, n)$:

$$E\{P[q; (m, n)]\} = c, \qquad q = 1, 2, \ldots, Q \tag{1a}$$

and

$$E\{P[0; (m, n)]\} = \epsilon \ll c, \qquad q = 0, \tag{1b}$$

where $c$ and $\epsilon$ are constants satisfying $\epsilon + Qc = 1$ and the expectation is taken over the raster coordinates.

An alternative approach would be to model the sequence of $q_{mn}$ as stationary Markov. This approach was not taken, however, because the assumption of nonstationary independent $q$ leads to a relatively simple codec that is robust with respect to both varied picture inputs and channel errors. The assumption of equal expectations in (1a) leads to minimax performance with respect to variations in the textural content of the picture to be encoded. Biasing this a priori distribution toward one predictor would make the codec more susceptible to poor performance on a picture that does not match this distribution. The problem faced by the codec is to estimate the probabilities $P[q; (m, n)]$ by suitable processing of past pixel outputs and to use the estimates to best advantage for bandwidth compression.

To summarize, the proposed model of the video process has the form (Fig. 2)

$$s_t = \begin{cases} F_0, & \text{with probability } P(0; t) \\ F_q + w_t, & \text{with probability } P(q; t), \quad 1 \le q \le Q, \end{cases} \tag{2}$$

where $F_0$ is an independent random variable uniform over $[0, 255]$, $F_q$ is a given linear function of pixels in the local past neighborhood of $s_t$ (Table I), and $w_t$ is an independent zero-mean random variable characterized by the pdf $g(w)$. The probabilities $P(q; t)$, $q = 0, 1, \ldots, Q$, vary slowly with respect to at least one coordinate of the raster and satisfy (1); otherwise, these probabilities are unknown. Note that the model embeds the elusive variety of gross image structure in the unknown probabilities $P(q; t)$, $0 \le q \le Q$. These represent the probabilistic information that the encoder hopes to learn by suitable processing of past image source outputs.

Figure 3 illustrates a representative output generated by the model. In obtaining this output, $g(w)$ was assumed Laplacian, and the $P(q; t)$ were estimated from the image of Fig. 1 by a procedure described in Section III. Significant increases in structural similarity to Fig. 1 are possible by modeling the sequence of $w_t$ as nonstationary. In the interest of codec simplicity, however, this additional complexity is not included in the model.
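The generative model (2) can be sketched in a few lines of Python. This is a minimal illustration under assumptions not taken from the paper: the probabilities $P(q; t)$ are held constant over the raster (the paper only requires them to vary slowly), outputs are clipped to $[0, 255]$, border pixels are drawn from the white source, and the array layout `x[m, n]` (line $m$, column $n$) and the function names are hypothetical.

```python
import numpy as np


def table1_predictions(x, m, n):
    """Predictors F1..F6 of Table I at line m, column n (layout x[m, n] is an assumption)."""
    return np.array([
        x[m, n - 1],                                  # F1: previous element
        x[m - 1, n - 1],                              # F2: previous line, previous element
        x[m - 1, n],                                  # F3: previous line, same element
        x[m - 1, n + 1],                              # F4: previous line, next element
        2 * x[m, n - 1] - x[m, n - 2],                # F5: slope along the line
        x[m, n - 1] + x[m - 1, n] - x[m - 1, n - 1],  # F6: planar
    ])


def simulate_field(lines, cols, probs, alpha, seed=0):
    """Draw one field from model (2).

    probs[0] is P(0; t) for the white source, probs[q] is P(q; t) for q = 1..6.
    Assumption: the probabilities are constant over the whole raster.
    """
    rng = np.random.default_rng(seed)
    # Border pixels (first line, first two columns, last column) come from the white source.
    x = rng.uniform(0.0, 255.0, size=(lines, cols))
    for m in range(1, lines):
        for n in range(2, cols - 1):
            q = rng.choice(len(probs), p=probs)       # switches S1/S2 pick a source
            if q == 0:
                value = rng.uniform(0.0, 255.0)       # white source F0
            else:
                # Laplacian innovation g(w) = (alpha/2) exp(-alpha |w|), i.e. scale 1/alpha
                w = rng.laplace(0.0, 1.0 / alpha)
                value = table1_predictions(x, m, n)[q - 1] + w
            x[m, n] = min(max(value, 0.0), 255.0)     # clip to the 8-bit range (assumption)
    return x


field = simulate_field(32, 32, probs=[0.02] + [0.98 / 6] * 6, alpha=0.5)
print(field.shape, field.min() >= 0.0, field.max() <= 255.0)
```

Setting all of the probability on one predictor (for example, $P(1; t) = 1$ with a very large $\alpha$) makes each line a near-constant continuation of its first pixels, which is a quick way to see how a single texture source behaves.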

III. ANALYTICAL DEVELOPMENTS 

The design of the dpcm codec of Section IV requires specification of both the quantizer and the predictor. Complete statistical information pertinent to this design is contained in the conditional pdf of $s_t$ given the set of past pixels $\{s_\tau, \tau < t\} = S_t^-$. This section describes how this conditional pdf can be estimated at the source output.

Fig. 3 — Representative output of source model. Output of the source model of Fig. 2, where $w_t$ is Laplacian and the probabilities $P(q; t)$ of eq. (2) are estimated from Fig. 1.

It can easily be shown that, for given $P(q; t)$, $q = 0, 1, \ldots, Q$, the probability density of $s_t$ conditioned on $S_t^-$ is (for $0 \le s_t \le 255$)

$$p(s_t \mid S_t^-) = \frac{P(0; t)}{256} + \sum_{q=1}^{Q} g\big(s_t - F_q(S_t^-)\big)\, P(q; t), \tag{3}$$

where $F_q(S_t^-)$ denotes the $q$th predictor $F_q$ of $s_t$ as an explicit function of the past pixels $S_t^-$. An estimate of density function (3) is obtained by replacing $P(q; t)$ in the above by its estimate, as described below.
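Equation (3) is straightforward to evaluate once the $Q$ predictions and the probabilities $P(q; t)$ are available. The sketch below assumes the Laplacian $g(w)$ later adopted in Section IV and the $1/256$ uniform term of (3); the function and argument names are illustrative only.

```python
import numpy as np


def laplacian(w, alpha):
    """Innovation density g(w) = (alpha / 2) exp(-alpha |w|)."""
    return 0.5 * alpha * np.exp(-alpha * np.abs(w))


def conditional_density(s, predictions, probs, alpha):
    """Eq. (3): p(s_t | S_t^-) for 0 <= s_t <= 255.

    predictions: F_q(S_t^-) for q = 1..Q, shape (Q,)
    probs:       P(q; t) for q = 0..Q, shape (Q + 1,)
    s:           scalar or array of candidate values of s_t
    """
    predictions = np.asarray(predictions, dtype=float)
    probs = np.asarray(probs, dtype=float)
    s = np.asarray(s, dtype=float)[..., None]                  # broadcast against the Q predictors
    white = probs[0] / 256.0                                    # uniform white-source term
    predictable = laplacian(s - predictions, alpha) @ probs[1:]  # sum over q = 1..Q
    return white + predictable


grid = np.arange(0.0, 255.0, 0.01) + 0.005     # midpoints of a fine grid on [0, 255]
p = conditional_density(grid, [120, 124, 130], [0.1, 0.3, 0.3, 0.3], alpha=0.5)
print(p.sum() * 0.01)                           # close to 1
```

The density is a mixture: a flat floor contributed by the white source plus one Laplacian bump centered on each predictor's estimate, weighted by how plausible that predictor currently is.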

Let the number of times the $q$th source has been output in a local past region $R_t$ of $N$ points neighboring $(m, n) = t$ (Fig. 4) be denoted by $n(q)$. Because of the nature of the source model, $n(q)$ cannot be measured at the source output. However, a reasonable and computable approximation to it is given by the expectation $E\{n(q) \mid S_t^-\}$, where the expectation assumes a random selection of $(m, n)$ and is taken over the density $g(w_t)$. By the quasi-stationarity of $P(q; t)$, we then set

$$P(q; t) = \frac{E\{n(q) \mid S_t^-\}}{N}, \tag{4}$$

which becomes (appendix)

$$P(q; t) = \frac{1}{N} \sum_{j=1}^{N} P(q_j = q \mid S_t^-), \tag{5}$$

where $P(q_j = q \mid S_t^-)$ is the conditional probability that the $j$th pixel in $R_t$ (Fig. 4) was output by source $q$, based upon the a priori probabilities $E\{P[q; (m, n)]\}$ of (1). Further manipulations (appendix) give

$$P(0; t) = K \frac{\epsilon}{256}, \qquad q = 0, \tag{6a}$$

$$P(q; t) = \frac{Kc}{N} \sum_{j=1}^{N} g\big(e_q(t; j)\big), \qquad 1 \le q \le Q, \tag{6b}$$

where $K$ satisfies

$$\sum_{q=0}^{Q} P(q; t) = 1. \tag{7}$$

Fig. 4 — Illustration of region $R_t$. This region consists of $N$ pixels in a local past vicinity of pixel $(m, n)$. Note that the numbering of coordinates $j = 1, 2, \ldots, N$ is arbitrary, as are the region boundaries.

Equation (5) interprets $P(q; t)$ as an arithmetic average of a posteriori probabilities of $q$ over region $R_t$, and eqs. (6) show how this average can be computed. The term $e_q(t; j)$ in (6b) is the difference between the actual value of the $j$th pixel in region $R_t$ and the value of this pixel predicted by predictor $F_q$; it is therefore the implied value of the $j$th innovations variable in $R_t$ under the hypothesis that predictor $q$ was in operation at the source. Explicitly,

$$e_q(t; j) = s_t^j - F_q(S_t^{j-}), \tag{8}$$

where $s_t^j$ is the $j$th pixel in $R_t$ and $S_t^{j-}$ is the set of pixels previous to $s_t^j$. Equation (6b) estimates $P(q; t)$ by summing the relative probabilities of the innovations implied under the hypothesis that source $q$ was in operation over region $R_t$. Note that if $g(\cdot)$ has its peak at zero, then $P(q; t)$, $1 \le q \le Q$, will be large for those $q$ corresponding to small prediction errors $e_q(t; j)$ over the $N$-point region. If none of the $Q$ predictors is consistent with the past local data, then all terms in the sum of (6b) will be small for $1 \le q \le Q$, and the normalization (7) will make $P(0; t)$ large. Further description of (4) to (7) is included in the derivation in the appendix.
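The estimate (6)-(7) reduces to a handful of array operations. The sketch below assumes a Laplacian $g(\cdot)$ and the $1/256$ white-source density of (3); it takes the pixels of $R_t$ and the $Q$ predictions of each of those pixels as inputs (array shapes and names are hypothetical), and applies the normalization (7) in place of an explicit $K$.

```python
import numpy as np


def estimate_source_probs(region_pixels, region_predictions, alpha, c, eps):
    """Eqs. (6)-(8): estimate P(q; t), q = 0..Q, from the N-point region R_t.

    region_pixels:      shape (N,), values s_t^j of the pixels in R_t
    region_predictions: shape (N, Q), F_q(S_t^{j-}) for pixel j and predictor q
    c, eps:             a priori expectations of eq. (1), with eps + Q * c = 1
    """
    region_pixels = np.asarray(region_pixels, dtype=float)
    region_predictions = np.asarray(region_predictions, dtype=float)
    N, Q = region_predictions.shape
    errors = region_pixels[:, None] - region_predictions     # eq. (8): e_q(t; j)
    g = 0.5 * alpha * np.exp(-alpha * np.abs(errors))        # Laplacian g(e_q(t; j))
    unnormalized = np.empty(Q + 1)
    unnormalized[0] = eps / 256.0                            # eq. (6a) without K
    unnormalized[1:] = c / N * g.sum(axis=0)                 # eq. (6b) without K
    return unnormalized / unnormalized.sum()                 # K fixed by eq. (7)


# A flat region: predictor 1 matches exactly, the others miss by 40 levels.
pixels = [100, 101, 100, 99]
preds = np.column_stack([pixels] + [np.add(pixels, 40)] * 5)
print(estimate_source_probs(pixels, preds, alpha=0.5, c=0.16, eps=0.04))
```

If every predictor misses badly, all the $g(\cdot)$ terms collapse toward zero and the normalization hands almost all of the probability to the white source, exactly as the text describes.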

The codec described in Section IV predicts $s_t$ by the estimated mean of the predictable source outputs:

$$\hat{s}_t = \frac{\sum_{q=1}^{Q} F_q(S_t^-)\, P(q; t)}{\sum_{q=1}^{Q} P(q; t)}. \tag{9}$$

An important characteristic of this prediction rule is its insensitivity to small variations in the data $S_t^-$, regardless of the relative values of $N$ and $Q$. This is in contrast to the covariance method of linear prediction described in a review paper by Makhoul,³ in which a small sample size can lead to an ill-conditioned system of equations whose solution is the adapted predictor. Since (9) is a weighted average of stable (and generally good) estimates $F_q$, stability persists even for $N < Q$, and some thought indicates that the resulting prediction of $s_t$ works in an intuitively reasonable way even if $N$ is only unity.
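The prediction rule (9) is a convex combination of the $Q$ fixed predictions, which is why it cannot become ill-conditioned. A minimal sketch follows; the names are illustrative, and the fallback to equal weights when every $P(q; t)$, $q \ge 1$, underflows to zero is our assumption, not part of the paper.

```python
import numpy as np


def weighted_prediction(predictions, probs):
    """Eq. (9): s_hat = sum_q F_q P(q; t) / sum_q P(q; t), over q = 1..Q.

    predictions: F_q(S_t^-) for q = 1..Q
    probs:       P(q; t) for q = 0..Q (the white-source entry probs[0] is ignored)
    """
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(probs, dtype=float)[1:]
    total = weights.sum()
    if total == 0.0:                      # assumption: fall back to a plain average
        return float(predictions.mean())
    return float(weights @ predictions / total)


print(weighted_prediction([10.0, 20.0], [0.5, 0.25, 0.25]))  # 15.0
```

Because the weights are nonnegative and normalized, the result always lies between the smallest and largest $F_q$, however few pixels ($N$) were used to estimate the weights.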
IV. THE CODEC

In this section, a codec resulting from the source model is described. A block diagram of the encoder is shown in Fig. 5. The codec has been used to code pictures at two and three bits per pel.

The encoder operates by forming $Q$ estimates $F_q(X_t^-)$, $1 \le q \le Q$, of the source output $s_t$ based upon the previously reconstructed field elements $X_t^-$. Estimates of the source probabilities $P(q; t)$, $0 \le q \le Q$, are made with eqs. (6) and (7) using the previously reconstructed pixels $X_t^-$ in place of $S_t^-$. The estimates $F_q(X_t^-)$ and probabilities $P(q; t)$ are used to predict the next encoder input pixel $s_t$ according to (9), and the most likely distribution of values of $s_t$ according to (3), with $X_t^-$ replacing $S_t^-$.

The encoder has been implemented using an $N = 4$ point learning region $R_t$ (Fig. 6) and $Q = 6$ predictors. The predictors used are given in Table I.

Note that with these six predictors, predictor (9) can take the form of any one of the most common fixed predictors used in intrafield coders. This form varies over the picture so that the best predictor (or best weighted sum), considering the recent past, is used at each sample point.

The pictures that were encoded consisted of 256 lines in two interleaved fields and 256 samples per line. The previous-line elements were taken from the previous line in the same field. In this environment, no advantage was obtained by including elements more than one line away in the estimates. Similarly, no visible improvement was obtained using elements more than two elements away on the same line. The slope estimator, $F_5$, and the planar estimator, $F_6$, were found to be particularly useful in the system that uses two bits per pel. These estimators allowed the coder to respond more quickly to edges within the picture and reduced slope overload.

Fig. 5 — The encoder.

Fig. 6 — Four-point region $R_t$ used by codec.

Table I

$F_1(n, m) = x(n - 1, m)$
$F_2(n, m) = x(n - 1, m - 1)$
$F_3(n, m) = x(n, m - 1)$
$F_4(n, m) = x(n + 1, m - 1)$
$F_5(n, m) = 2x(n - 1, m) - x(n - 2, m)$
$F_6(n, m) = x(n - 1, m) + x(n, m - 1) - x(n - 1, m - 1)$
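Table I can be transcribed directly. The sketch below assumes, as the table's structure suggests, that $x(n, m)$ is the reconstructed pixel in column $n$ of line $m$ of the current field, and stores the field as a NumPy array indexed `x[m, n]`; border handling is left to the caller, and the function name is hypothetical. On a planar ramp, the slope and planar predictors $F_5$ and $F_6$ are exact while the single-pixel predictors are offset.

```python
import numpy as np


def table_i_predictors(x, n, m):
    """Predictors F1..F6 of Table I for column n of line m.

    x is indexed x[m, n] (line, column); requires m >= 1 and 2 <= n <= x.shape[1] - 2.
    """
    x = np.asarray(x, dtype=float)
    return np.array([
        x[m, n - 1],                                  # F1 = x(n-1, m)
        x[m - 1, n - 1],                              # F2 = x(n-1, m-1)
        x[m - 1, n],                                  # F3 = x(n, m-1)
        x[m - 1, n + 1],                              # F4 = x(n+1, m-1)
        2 * x[m, n - 1] - x[m, n - 2],                # F5 = 2x(n-1, m) - x(n-2, m)
        x[m, n - 1] + x[m - 1, n] - x[m - 1, n - 1],  # F6 = x(n-1, m) + x(n, m-1) - x(n-1, m-1)
    ])


# Planar ramp x(n, m) = 2n + 3m + 5, evaluated at n = 3, m = 2 (true value 17).
lines, cols = np.mgrid[0:4, 0:6]
ramp = 2 * cols + 3 * lines + 5
print(table_i_predictors(ramp, n=3, m=2))  # [15. 12. 14. 16. 17. 17.]
```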

Ideally, the quantizer should be adapted at each point to the estimated probability distribution of $s_t$. In view of the complex form of (3), this type of redesign is not feasible, and the following ad hoc curve-fitting technique was used to simplify the adaptation algorithm. The density function $g(w)$ was taken as Laplacian, $g(w) = (\alpha/2)\exp(-\alpha |w|)$, and the Max quantizer⁴ for this density was determined. Each side of distribution (3) about the mean $\hat{s}_t$ was then approximated by an exponential distribution, and the axis was simply scaled appropriately in codec operation to place the quantization levels. The parameter of each exponential distribution was selected so that it had the same first moment about $\hat{s}_t$ as the corresponding portion of the actual distribution, as described in Fig. 7.
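The curve-fitting procedure can be sketched as follows. Because each side of the Laplacian is an exponential, this sketch computes the Max (Lloyd-Max) levels of a unit-mean exponential by Lloyd iteration rather than taking them from published tables, then stretches them on each side of $\hat{s}_t$ by the first moment of that side of a sampled density (3). Two choices are our assumptions: a decision threshold at $\hat{s}_t$ itself (the even-level symmetric case), and reading "first moment of each side" as the conditional mean of that side. All names are hypothetical, and the grid is assumed uniform.

```python
import numpy as np


def _centroids(t):
    """Centroids of the unit-mean exponential on cells [t_k, t_{k+1}); last cell unbounded."""
    d = np.append(np.diff(t), np.inf)                  # cell widths
    with np.errstate(invalid="ignore"):                # inf * 0 in the unbounded cell
        shrink = np.where(np.isinf(d), 0.0, d * np.exp(-d) / -np.expm1(-d))
    return t + 1.0 - shrink                            # a + 1 - d e^{-d} / (1 - e^{-d})


def lloyd_max_exponential(levels_per_side, iters=10_000, tol=1e-12):
    """Lloyd-Max thresholds and output levels for a unit-mean exponential on [0, inf)."""
    t = np.arange(levels_per_side, dtype=float)        # initial thresholds, t[0] = 0
    for _ in range(iters):
        r = _centroids(t)
        t_new = np.concatenate(([0.0], 0.5 * (r[:-1] + r[1:])))  # midpoint condition
        done = np.max(np.abs(t_new - t)) < tol
        t = t_new
        if done:
            break
    return t, _centroids(t)


def side_means(grid, density, s_hat):
    """Conditional first moments of the upper and lower parts of a sampled density about s_hat."""
    upper = grid >= s_hat
    lower = ~upper
    m_u = np.sum((grid[upper] - s_hat) * density[upper]) / np.sum(density[upper])
    m_l = np.sum((s_hat - grid[lower]) * density[lower]) / np.sum(density[lower])
    return m_u, m_l


def quantize(s, s_hat, m_u, m_l, thresholds, levels):
    """Quantize s with the unit-mean quantizer stretched by m_u (above s_hat) or m_l (below)."""
    d = s - s_hat
    sign, scale = (1.0, m_u) if d >= 0 else (-1.0, m_l)
    k = np.searchsorted(thresholds, abs(d) / scale, side="right") - 1
    return s_hat + sign * scale * levels[k]


t, r = lloyd_max_exponential(2)                        # two levels per side: 2 bits/pel
grid = np.arange(0.0, 255.0, 0.01) + 0.005
density = 0.25 * np.exp(-0.5 * np.abs(grid - 128.0))   # stand-in for eq. (3), alpha = 0.5
m_u, m_l = side_means(grid, density, 128.0)            # both close to 1 / alpha = 2
print(t, r)
print(quantize(131.0, 128.0, m_u, m_l, t, r))
```

With two levels per side the iteration settles near a threshold of 1.594 and output levels of about 0.594 and 2.594 (in units of the side's mean), so a lopsided distribution (3) simply yields different stretch factors $m_u$ and $m_\ell$ above and below the prediction.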

When the estimated probability of occurrence of the random estimator (source 0) is near zero and the estimators $q = 1$ through 6 are identical, corresponding to a perfectly flat region in the picture, the exponential defining the quantizer is at its narrowest. In this situation, the parameter of that exponential is approximately $\alpha$, the parameter of the Laplacian distribution defining the innovation term in the model. Therefore, $\alpha$ determines the minimum values of the quantizer levels, and these, in turn, determine the amount of granularity due to quantization noise in flat regions of the picture and the ability of the coder to respond to unexpected edges. Since the quantizer levels scale as $1/\alpha$, the larger the value of $\alpha$, the lower the granular quantization noise; the smaller the value of $\alpha$, the quicker the coder can respond to edges. Because of this interaction, the values of $\alpha$ were selected experimentally, based upon visual examination of a sequence of coded pictures for the two- and three-bit/pel quantizers. For the two-bit/pel quantizer, $\alpha$ was selected so that the minimum value of the inner quantizer level equals two picture levels when the picture is initially quantized into 256 levels. For the three-bit/pel quantizer, $\alpha$ was selected so that the minimum inner quantization level equals one picture level.

ADAPTIVE INTRAFRAME DPCM CODEC 1403

[Figure: estimated distribution of $s_t$, with the amplitude axis marked, from left to right, $F_5$, $F_3$, $F_6$, $\hat{s}_t$, $F_4$, $F_2$, $F_1$]

Fig. 7 — Illustration of encoder's derivation of predictor and quantizer. $F_0$ refers to the distribution of the white source output; $\hat{s}_t$ is the weighted-sum predictor; $m_u$ and $m_\ell$ are the upper and lower first moments of the actual distribution about $\hat{s}_t$; and $c_u$ and $c_\ell$ are the exponential distributions used to determine the quantizer.

The random variable $F_0$ in the source model of Fig. 2 is uniformly distributed over the range of possible values the sample can assume. In implementing the codec, it was found desirable to assume that the range of $F_0$ is somewhat reduced. Limiting the span of $F_0$ is particularly necessary in the system that transmits two bits per pel. This can be seen as follows. Assume that the probability $P(0; t)$ is estimated by the encoder to be close to unity. In this situation, if $F_0$ has range $[0, 255]$, the four quantization levels will be spread over the entire range of possible sample values. It is then likely that none of the estimators will be close to the reconstructed value $x_t$, even though one of them may have closely approximated the actual value $s_t$. Thus, the random estimator may be used again for the next sample. This creates an instability in the coder that can propagate into flat regions of the picture. To eliminate this type of instability, the maximum range of the quantizer was limited. To be consistent with this limit, the span of $F_0$ was restricted to a symmetrical region about $\hat{s}_t$ of (9). The maximum span of the quantizer was also set experimentally. In the two-bit/pel system, the maximum span of the quantizer was set so that the inner level of the quantizer is eight picture levels; in the three-bit/pel system, it was set so that the inner level is four picture levels. In the two-bit/pel system, there are only two quantization levels on each side of the predicted value, so the maximum span of the quantizer determines the ability of the encoder to track sudden changes in the picture. It is therefore necessary to make the maximum quantizer span as large as possible without making the encoder unstable. In the three-bit/pel system, four quantization levels are on each side of the predicted value. In this system, restricting the maximum quantizer span was necessary to prevent the span from frequently exceeding the range of possible picture levels and wasting quantization levels. This is why a smaller maximum value of the inner quantization level was selected for the three-bit/pel system than for the two-bit/pel system.
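The experimentally chosen minimum and maximum inner levels act as a clamp on the per-side stretch of the quantizer. A minimal sketch, assuming a quantizer whose levels are a unit-mean set stretched by a per-side scale $m$, and reading "inner level" as the innermost output level $m r_1$, where $r_1$ is the innermost level of the unit-mean quantizer; both readings, the names, and the `half_span` parameter of the restricted white source are our assumptions. The constant 0.5936 is the innermost Lloyd-Max level of a unit-mean exponential with two levels per side.

```python
import numpy as np


def clamp_scale(m, inner_level, min_inner, max_inner):
    """Clamp a per-side scale so the innermost output level m * inner_level
    stays between min_inner and max_inner picture levels."""
    return float(np.clip(m, min_inner / inner_level, max_inner / inner_level))


def limited_white_density(s, s_hat, half_span):
    """White-source term with F0 restricted to a symmetric span about s_hat
    (cut to [0, 255]); half_span is a hypothetical parameter."""
    lo, hi = max(s_hat - half_span, 0.0), min(s_hat + half_span, 255.0)
    s = np.asarray(s, dtype=float)
    return np.where((s >= lo) & (s <= hi), 1.0 / (hi - lo), 0.0)


# Two-bit/pel limits from the text: inner level between 2 and 8 picture levels.
R1_TWO_LEVELS = 0.5936  # innermost unit-mean exponential Lloyd-Max level, 2 levels/side
for m in (1.0, 5.0, 40.0):
    print(m, clamp_scale(m, R1_TWO_LEVELS, 2.0, 8.0) * R1_TWO_LEVELS)
```

The lower clamp plays the role of $\alpha$ in flat regions, while the upper clamp keeps a near-certain white source from spreading the few available levels across the whole amplitude range.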

Figures 8 and 9 show the quantization span for various parts of the picture in the two- and three-bit/pel systems. In these pictures, the average of the upper and lower quantization spans is displayed; white areas correspond to the smallest span of the quantizer and black areas to the largest. It is interesting to note that the resulting quantizer adaptation is similar to that which would be expected if a masking function were used.⁵ However, this quantizer adaptation was arrived at strictly by mathematical techniques, minimizing the expected point mean-squared error with a varying probability distribution of next sample values, rather than by the psychovisual considerations used to derive masking functions. This result is consistent with Graham's early observations concerning the strong connection between image chaos and the visual system's tolerance to noise-like distortion.²

Fig. 8 — Quantizer range adaptation of the two-bit/pel codec.

Fig. 9 — Quantizer range adaptation of the three-bit/pel codec.

V. RESULTS 

This adaptive predictor with two- and three-bit/pel quantizers has been implemented and compared with an adaptive predictor using Graham's rule² with the three-bit/pel fixed quantizer suggested in the Graham paper, and with a previous-element dpcm encoder with a fixed three-bit/pel quantizer. Our two-bit/pel adaptive predictor has considerably less slope overload than the previous-element predictor with a three-bit/pel quantizer, but it is not quite as good as the Graham predictor with a three-bit/pel quantizer. Our adaptive predictor with a three-bit/pel quantizer has less slope overload than the Graham predictor with a three-bit/pel quantizer. In addition, the estimates at edges within the picture are accurate enough to virtually eliminate the edge busyness in moving sequences that is characteristic of many adaptive predictors. To demonstrate these characteristics, the differences between the original picture of the checker girl (Fig. 1) and the results of processing by these four techniques are shown in Figs. 10 to 13.

Fig. 10 — Two-bit/pel codec performance. Top: Decoder output. Bottom: Difference between decoder output and original.

Fig. 11 — Three-bit/pel codec performance. Top: Decoder output. Bottom: Difference between decoder output and original.

Fig. 12 — Performance of three-bit Graham codec. Top: Decoder output. Bottom: Difference between decoder output and original.

Fig. 13 — Performance of three-bit previous element dpcm. Top: Decoder output. Bottom: Difference between decoder output and original.

1410 THE BELL SYSTEM TECHNICAL JOURNAL, JULY-AUGUST 1979 



APPENDIX 

This appendix traces the development from eq. (4) to eq. (7) of Section III.

There are $M = (Q + 1)^N$ possible vectors $V = (q_1, q_2, \ldots, q_j, \ldots, q_N)$ of source options in the $N$-point region $R_t$ of Fig. 4. Number these vectors $i = 1, 2, \ldots, M$ and let $V_i$ denote the $i$th vector. Define $n(q \mid V_i)$ as the number of components in $V_i$ that equal the specific value $q$. Then (4) becomes

$$P(q; t) = E\left\{\frac{n(q)}{N} \,\Big|\, S_t^-\right\} \tag{10a}$$

$$= \frac{1}{N} \sum_{i=1}^{M} n(q \mid V_i)\, P(V_i \mid S_t^-). \tag{10b}$$

In taking the expectation in (10a), we have treated $t = (m, n)$ as a randomly chosen raster point for which $P(q) = E\{P(q; t)\}$ of eq. (1) applies. Because the $q$ are selected independently, $P(V_i \mid S_t^-)$ of (10b) is related to $P(q)$ by

$$P(V_i \mid S_t^-) = \frac{p(S_t^- \mid V_i)}{p(S_t^-)}\, P(V_i) \tag{11a}$$

$$= \frac{p(S_t^- \mid V_i)}{p(S_t^-)} \prod_{j=1}^{N} P(q_{ij}), \tag{11b}$$

where $p(S_t^- \mid (\cdot))$ denotes the probability density function of the vector of values in $S_t^-$, and $q_{ij}$ is the $j$th component of $V_i$. [The random selection of $t$ cannot affect the independence of the components of $V_i$ in (11a); hence (11b).] Substituting

$$n(q \mid V_i) = \sum_{j=1}^{N} \delta_{q\, q_{ij}}, \tag{12}$$

with $\delta$ the Kronecker delta, into (10b) and summing over $i$, (10b) becomes

$$P(q; t) = \frac{1}{N} \sum_{j=1}^{N} P(q_j = q \mid S_t^-), \tag{13}$$

where $P(q_j = q \mid S_t^-)$ is the conditional probability that the $j$th pixel in $R_t$ was generated by source $q$. Note that (13) gives $P(q; t)$ as an average of a posteriori probabilities of $q$ over the region $R_t$, where

$$P(q_j = q \mid S_t^-) = \frac{p(S_t^- \mid q_j = q)}{p(S_t^-)}\, P(q). \tag{14}$$
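The step from (10b) to (13) can be checked numerically for small $N$ and $Q$ by enumerating all $M = (Q + 1)^N$ source vectors. The sketch assumes, as in (11b) and (14), that the joint posterior of $V_i$ factors into one term per pixel, here an arbitrary positive likelihood `lik[j, q]` times the prior $P(q)$; the random numbers are placeholders, not image data, and the function names are hypothetical.

```python
import itertools

import numpy as np


def brute_force_average(lik, prior):
    """Eq. (10b): E{n(q) / N | S} by enumerating every source vector V_i."""
    N, num_sources = lik.shape
    expected_counts = np.zeros(num_sources)
    total = 0.0
    for vector in itertools.product(range(num_sources), repeat=N):
        weight = np.prod([lik[j, q] * prior[q] for j, q in enumerate(vector)])
        expected_counts += weight * np.bincount(vector, minlength=num_sources)  # n(q | V_i)
        total += weight
    return expected_counts / (total * N)


def per_pixel_average(lik, prior):
    """Eq. (13): average over R_t of the per-pixel posteriors of eq. (14)."""
    posterior = lik * prior
    posterior /= posterior.sum(axis=1, keepdims=True)
    return posterior.mean(axis=0)


rng = np.random.default_rng(0)
lik = rng.uniform(0.01, 1.0, size=(3, 4))      # N = 3 pixels, Q + 1 = 4 sources
prior = np.array([0.04, 0.32, 0.32, 0.32])     # eps, c, c, c with eps + Q c = 1
print(np.allclose(brute_force_average(lik, prior), per_pixel_average(lik, prior)))  # True
```

The brute-force sum grows as $(Q + 1)^N$, while (13) needs only $N(Q + 1)$ terms, which is what makes the estimate practical at every pixel.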

We now partition the set of pixels $S_t^-$ into (i) a subset of pixels future to $s_j$ but past to $s_t$ (call it $S_j^+$); (ii) the pixel $s_j$; and (iii) a subset of pixels $S_j^-$ past to pixel $s_j$. The elements in $S_j^+$, when conditioned on $s_j$ and $S_j^-$, do not depend upon $q_j$, and it follows after straightforward manipulations that

$$P(q_j = q \mid S_t^-) = K\, g\big(s_j - F_q(S_j^-)\big)\, P(q), \qquad 1 \le q \le Q,$$

and

$$P(q_j = 0 \mid S_t^-) = \frac{K}{256}\, P(0), \tag{15}$$

where $K$ is a normalizing constant. Substitution of eqs. (15) and (1) into (13) yields (6).